Performance Analysis of Parallel Applications Running on SMP
Authors
Abstract
In this work, using dynamic analysis techniques, we analyze how a workload can be accelerated on a shared-bus shared-memory multiprocessor. It is well known that, in this kind of system, the bus is the critical element that can limit the scalability of the machine. Nevertheless, many factors that influence bus utilization have not yet been investigated for this kind of workload, in particular the effects of thread migration. The operating system effects are also considered in our evaluation. We analyzed a basic four-processor machine and a high-end sixteen-processor machine, implementing three different coherence protocols (including MESI and another solution from the literature). We show that even in the four-processor case, the overhead induced by the sharing of private data as a consequence of process migration, namely passive sharing, cannot be neglected. Indeed, the analysis shows that a protocol based on a selective strategy for dealing with private and shared data performs better than protocols that either rely on detecting the migratory access pattern or use a pure Write-Invalidate strategy, like MESI. We varied the architectural parameters to show how passive sharing and other coherence overheads are influenced by different cache choices. We then considered the sixteen-processor case, where the effects on performance are more evident. We also conclude that performance benefits from large caches and cache-affinity scheduling. However, even with affinity scheduling, a selective protocol delivers better performance.
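To make the notion of passive sharing concrete, the following is a minimal sketch of a simplified MESI model for a single cache line, counting bus transactions. It is not the simulator or the methodology used in the paper: the class `MesiLine`, the helper `count_bus_transactions`, and the two traces are hypothetical names introduced only for illustration, and the model ignores evictions, write-backs, and timing.

```python
class MesiLine:
    """One cache line tracked across private caches (simplified MESI, illustrative only)."""

    def __init__(self, num_cpus):
        self.state = ["I"] * num_cpus   # per-cache state: M, E, S or I
        self.bus_transactions = 0

    def read(self, cpu):
        if self.state[cpu] != "I":
            return                      # read hit: no bus traffic
        self.bus_transactions += 1      # read miss: bus read
        holders = [c for c, s in enumerate(self.state) if c != cpu and s != "I"]
        for c in holders:
            self.state[c] = "S"         # M/E/S holders downgrade to Shared
        self.state[cpu] = "S" if holders else "E"

    def write(self, cpu):
        if self.state[cpu] in ("S", "I"):
            self.bus_transactions += 1  # invalidate (from S) or read-for-ownership (from I)
            for c in range(len(self.state)):
                if c != cpu:
                    self.state[c] = "I"
        # From E or M the write is silent: no bus transaction is needed.
        self.state[cpu] = "M"


def count_bus_transactions(trace, num_cpus=2):
    """Replay a list of (cpu, 'r' | 'w') accesses to one private line."""
    line = MesiLine(num_cpus)
    for cpu, op in trace:
        if op == "r":
            line.read(cpu)
        else:
            line.write(cpu)
    return line.bus_transactions


# A process touching its private data without migrating (assumed trace).
STAY = [(0, "r"), (0, "w"), (0, "r"), (0, "w")]
# The same accesses, but the process migrates from CPU 0 to CPU 1 halfway.
MIGRATE = [(0, "r"), (0, "w"), (1, "r"), (1, "w")]

if __name__ == "__main__":
    print("no migration:", count_bus_transactions(STAY))
    print("with migration:", count_bus_transactions(MIGRATE))
```

In this toy model the non-migrating trace costs one bus transaction (the initial miss), while the migrating trace costs three: after migration, the stale copy left in the old cache makes the private line look shared, so the first write on the new processor must broadcast an invalidation. That extra invalidation is the kind of overhead the abstract calls passive sharing.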
Related resources
Architectural Effects of Symmetric Multiprocessors on TPC-C Commercial Workload
Commercial transaction processing applications are an important workload running on symmetric multiprocessor systems (SMPs). They differ dramatically from scientific, numeric-intensive, and engineering applications because they are I/O bound, and they contain more system software activities. Most SMP servers available in the market have been designed and optimized for scientific and engineering...
Intone - Tools and Environments for OpenMP on Clusters of SMPs
Clusters of small-scale SMP computers are becoming more and more common as high-performance computing needs have arisen, not only in national scientific laboratories, but also in enterprises of various kinds. An SMP cluster represents a sweet spot of cost-efficiency compared to a larger SMP system or to a cluster with smaller nodes. With the emergence of OpenMP, shared memory computing has also...
Performance Analysis of PC-CLUMP based on SMP-Bus Utilization
PC-CLUMP (Cluster of Multiprocessor) is one of the most cost-effective commodity-based platforms for HPC applications. Increasing the number of CPUs per SMP node yields a very compact system and a very low network-interface cost per processor while keeping the number of CPUs in the system. However, the performance of SMP-bus on such an SMP-PC node is relatively poor compared with that of SMP...
Group-Based Performance Analysis for Multithreaded SMP Cluster Applications
Performance optimization remains one of the key issues in parallel computing. With the emergence of large clustered SMP systems, the task of analyzing and tuning scientific applications actually becomes harder. Tools need to be extended to cover both distributed and shared-memory styles of performance analysis and to handle the massive amount of information generated by applications on today’s...
Development of Solid Earth Simulation Platform Sparse Approximate Inverse Preconditioner for Contact Problems using OpenMP Project Representative
The three-level hybrid parallel programming model consisting of MPI, OpenMP and vectorization with multicolor-based reordering methods provides optimum performance on SMP cluster architectures with vector processors such as the Earth Simulator (ES) for finite-element type applications. While the three-level hybrid and flat MPI parallel programming models offer similar performance, the hybrid pr...
SMPFRAME: A Distributed Framework for Scheduled Model Parallel Machine Learning
Machine learning (ML) problems commonly applied to big data by existing distributed systems share and update all ML model parameters at each machine using a partition of data, a strategy known as data-parallel. An alternative and complementary strategy, model-parallel, partitions model parameters for non-shared parallel access and update, periodically repartitioning to facilitate communication...
Journal title:
Volume, issue
Pages -
Publication date: 2001